exploration value
EVOLvE: Evaluating and Optimizing LLMs For Exploration
Nie, Allen, Su, Yi, Chang, Bo, Lee, Jonathan N., Chi, Ed H., Le, Quoc V., Chen, Minmin
Despite their success in many domains, large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. This is crucial as many real-world applications, ranging from personalized recommendations to healthcare interventions, demand that LLMs not only predict but also actively learn to make optimal decisions through exploration. In this work, we measure LLMs' (in)ability to make optimal decisions in bandits, a stateless reinforcement learning setting relevant to many applications. We develop a comprehensive suite of environments, including both context-free and contextual bandits with varying task difficulties, to benchmark LLMs' performance. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs: providing explicit algorithm-guided support during inference, and algorithm distillation via in-context demonstrations and fine-tuning on synthetic data generated from these algorithms. Impressively, these techniques allow us to achieve superior exploration performance with smaller models, surpassing larger models on various tasks. We conduct an extensive ablation study to shed light on various factors, such as task difficulty and data representation, that influence the efficiency of LLM exploration. Additionally, we conduct a rigorous analysis of the LLM's exploration efficiency using the concept of regret, linking its ability to explore to the model size and underlying algorithm.
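As a rough illustration of the kind of optimal exploration algorithm whose trajectories could be distilled into an LLM, here is a minimal UCB1 sketch on a context-free bandit. The abstract does not specify which algorithm is used; the `pull` callback and the (arm, reward) history format below are assumptions for illustration only.

```python
import numpy as np

def ucb_bandit(pull, n_arms, horizon, c=2.0):
    """Classic UCB1 on a context-free bandit.

    `pull` is a hypothetical callback returning a stochastic reward for an arm;
    the (arm, reward) history it produces is the sort of trajectory that could be
    rendered as text for in-context demonstrations or fine-tuning data.
    """
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    history = []
    for t in range(horizon):
        if t < n_arms:                       # play every arm once first
            arm = t
        else:                                # optimism in the face of uncertainty
            bonus = np.sqrt(c * np.log(t + 1) / counts)
            arm = int(np.argmax(means + bonus))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean update
        history.append((arm, reward))
    return history
```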
ReVoLT: Relational Reasoning and Voronoi Local Graph Planning for Target-driven Navigation
Liu, Junjia, Guo, Jianfei, Meng, Zehui, Xue, Jingtao
Embodied AI is an inevitable trend that emphasizes the interaction between intelligent entities and the real world, with broad applications in robotics, especially target-driven navigation. This task requires the robot to efficiently find an object of a given category in an unknown domestic environment. Recent works focus on exploiting layout relationships with graph neural networks (GNNs). However, most of them obtain robot actions directly from observations in an end-to-end manner via an incomplete relation graph, which is neither interpretable nor reliable. We decouple this task and propose ReVoLT, a hierarchical framework: (a) an object detection visual front-end, (b) a high-level reasoner (infers semantic sub-goals), (c) an intermediate-level planner (computes geometrical positions), and (d) a low-level controller (executes actions). ReVoLT operates with a multi-layer semantic-spatial topological graph. The reasoner uses multiform structured relations as priors, obtained from combinatorial relation extraction networks composed of unsupervised GraphSAGE, GCN, and GraphRNN-based Region Rollout. The reasoner applies Upper Confidence Bound for Trees (UCT) to infer semantic sub-goals, trading off exploitation (depth-first searching) against exploration (regretting). The lightweight intermediate-level planner generates instantaneous spatial sub-goal locations via an online constructed Voronoi local graph. Simulation experiments demonstrate that our framework achieves better performance on target-driven navigation tasks and generalizes well, with an 80% improvement over the existing state-of-the-art method. The code and result video will be released at https://ventusff.github.io/ReVoLT-website/.
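To make the UCT-based sub-goal inference concrete, below is a minimal sketch of the standard UCT selection rule over candidate semantic sub-goals. The child statistics and the `c` constant are assumed notation, not taken from the paper; unvisited children are tried first, mirroring the exploitation/exploration ("regretting") trade-off described above.

```python
import math

def uct_select(children, c=1.4):
    """Pick the index of the child sub-goal with the highest UCT score.

    `children` is a hypothetical list of (visits, total_value) pairs for the
    candidate semantic sub-goals; the parent visit count is their sum.
    """
    parent_visits = sum(v for v, _ in children) or 1
    best, best_score = None, -math.inf
    for idx, (visits, total_value) in enumerate(children):
        if visits == 0:
            return idx                                     # explore unvisited sub-goals first
        score = total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = idx, score
    return best
```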
Intrinsic Exploration as Multi-Objective RL
Morere, Philippe, Ramos, Fabio
Intrinsic motivation enables reinforcement learning (RL) agents to explore when rewards are very sparse, where traditional exploration heuristics such as Boltzmann or ɛ-greedy would typically fail. However, intrinsic exploration is generally handled in an ad-hoc manner, where exploration is not treated as a core objective of the learning process; this weak formulation leads to sub-optimal exploration performance. To overcome this problem, we propose a framework based on multi-objective RL in which exploration and exploitation are optimized as separate objectives. This formulation lifts the balance between exploration and exploitation to the policy level, resulting in advantages over traditional methods. It also allows exploration to be controlled during learning at no extra cost. Such strategies achieve a degree of control over agent exploration that was previously unattainable with classic or intrinsic rewards. We demonstrate scalability to continuous state-action spaces by presenting a method (EMU-Q) based on our framework, guiding exploration towards regions of higher value-function uncertainty. EMU-Q is experimentally shown to outperform classic exploration techniques and other intrinsic RL methods on a continuous control benchmark and on a robotic manipulator.
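A minimal sketch of the idea of keeping exploitation and exploration as separate objectives and scalarizing them at the policy level is shown below. The `q_mean`/`q_std` estimates and the weight `beta` are assumptions for illustration; they are not the paper's exact formulation of EMU-Q.

```python
import numpy as np

def select_action(q_mean, q_std, beta=1.0):
    """Scalarized two-objective action selection (illustrative only).

    `q_mean` and `q_std` are hypothetical per-action estimates of the value
    function and its uncertainty (e.g. from a Bayesian or ensemble critic).
    Exploitation (mean value) and exploration (uncertainty) are combined with a
    tunable weight `beta`, which can be adjusted online to control exploration.
    """
    q_mean = np.asarray(q_mean, dtype=float)
    q_std = np.asarray(q_std, dtype=float)
    return int(np.argmax(q_mean + beta * q_std))
```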
Policy Optimization with Model-based Explorations
Pan, Feiyang, Cai, Qingpeng, Zeng, An-Xiang, Pan, Chun-Xiang, Da, Qing, He, Hualin, He, Qing, Tang, Pingzhong
Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from bias in the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the trade-off between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions' target values: a model-free one estimated by Monte-Carlo sampling, and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error between these two target estimations as an additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore states with larger target errors, which are hard to estimate. We compare POME with PPO on Atari 2600 games, and the results show that POME outperforms PPO on 33 out of 49 games.
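Here is a minimal sketch of the exploration bonus described above: the disagreement between a model-free and a model-based target estimate is added to the target value. The function name, the bonus weight `alpha`, and the exact form of the model-based target are assumptions, not the paper's precise definitions.

```python
import numpy as np

def pome_style_targets(mc_returns, next_state_values, rewards, gamma=0.99, alpha=0.1):
    """Illustrative POME-style targets with a disagreement-based exploration bonus.

    `mc_returns` are model-free Monte-Carlo target values, `next_state_values`
    are values of next states predicted via a learned transition model, and
    `rewards` are the immediate rewards.
    """
    mc_returns = np.asarray(mc_returns, dtype=float)
    model_targets = np.asarray(rewards, dtype=float) + gamma * np.asarray(next_state_values, dtype=float)
    exploration_bonus = np.abs(mc_returns - model_targets)   # disagreement as exploration value
    return mc_returns + alpha * exploration_bonus             # hard-to-estimate pairs get larger targets
```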